New Questions
memory-efficient hash kind for incremental sort
on Jan 06, 2009 at 19:57
3 direct replies
by iaw4
dear monks---I have an odd need. I want to do an incremental search on words that sit in many files. so, first I form a hash, such as
michael => "file1.txt"
mike => "file1.txt,file30.txt"
...
now, I would like to see all keys matching a subset, such as
my %mi_results = $myhashlike{ "mi*" };
this is not hard if the hash is small. first, put all the keys into an array, then do a grep-match on the keys, and then extract the results from %myhashlike.
unfortunately, I may have up to 300 million words (keys) from 30,000 files in my hash.
what's a good solution for this sort of problem? are there databases that allow regex key searches that would be suitable (esp. if they can cache intelligently)? any perl solutions? is there such a thing as a memory-efficient (say, read-only squeezed) hash?
advice appreciated.
/iaw
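One possible approach, not taken from the post: sort the keys once, binary-search for the first key that is not less than the prefix, then walk forward while keys still start with it. All keys sharing a prefix are contiguous in string order, so each query costs O(log N) plus the matches instead of a grep over every key. A minimal sketch; `keys_with_prefix` and `%files_for` are hypothetical names. With 300 million keys the sorted array still needs a lot of RAM, so an on-disk B-tree (I believe DB_File's `$DB_BTREE` mode documents partial-key matching with a cursor) would apply the same idea without holding everything in memory.

```perl
use strict;
use warnings;

# Return every key in the sorted array @$sorted that begins with $prefix.
# Binary search finds the first key >= $prefix; keys with that prefix are
# contiguous in cmp order, so we walk forward until the prefix stops matching.
sub keys_with_prefix {
    my ($sorted, $prefix) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] lt $prefix) { $lo = $mid + 1 }
        else                            { $hi = $mid }
    }
    my @found;
    for (my $i = $lo; $i < @$sorted; $i++) {
        last unless index($sorted->[$i], $prefix) == 0;
        push @found, $sorted->[$i];
    }
    return @found;
}

my %files_for = (
    michael => 'file1.txt',
    mike    => 'file1.txt,file30.txt',
    bob     => 'file2.txt',
);
my @sorted = sort keys %files_for;    # sort once, reuse for every query
my %mi_results = map { $_ => $files_for{$_} } keys_with_prefix(\@sorted, 'mi');
```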
Typos in perl documentation - who to tell?
on Jan 06, 2009 at 05:05
1 direct reply
by missingthepoint
Greetings, monks.
In my copy of perldoc base (ActivePerl 5.10.0) one line is missing a tab/4 spaces. This is minor but I had to read the example again to grasp it (and it's a trivial example). To whom or where should I report this? I'd fix it myself if I had access.
Thanks for any enlightenment. :)
Life is denied by lack of attention,
whether it be to cleaning windows
or trying to write a masterpiece...
-- Nadia Boulanger
Quote-like operators in a hash
on Jan 05, 2009 at 11:10
1 direct reply
by tux402
I have many MS SQL queries in a hash that I am looping over. What I think I need to do is to be able to use quote-like operators in the hash value. Here is a sample hash entry:
'Calls Received'=>"SELECT COUNT(DISTINCT sessionID ) FROM AgentConnectionDetail WHERE startDateTime BETWEEN '$yesterday' AND '$today'"
The $yesterday and $today variables are filled in correctly with a datetime format.
When I loop on the hash, and execute the value as a sql query, the return I get is either a 1 or a 0. Even when I had the sql queries in an array, the return was almost always a 1 or a 0 and not the correct sql query response. I do know that the actual query is correct. I tried doing,
'Calls Received'=>qq{"SELECT COUNT(DISTINCT sessionID ) FROM AgentConnectionDetail WHERE startDateTime BETWEEN '$yesterday' AND '$today'"}
but that didn't seem to work either. I got the error,
DBD::ODBC::db prepare failed: [FreeTDS][SQL Server]Statement(s) could not be prepared. (SQL-42000)
[FreeTDS][SQL Server]The identifier that starts with 'SELECT COUNT(DISTINCT sessionID ) FROM AgentConnectionDetail WHERE startDateTime BETWEEN '2009-01-04 15:41:51.000' AND '2009-01-' is too long. Maximum length is 128. (SQL-42000) at ./voip.pl line 75.
Can't call method "execute" on an undefined value at ./voip.pl line 76.
Any ideas?
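A hedged sketch of two likely culprits, assuming DBI. First, `qq{"..."}` keeps the inner double quotes, and SQL Server (with QUOTED_IDENTIFIER on) reads a double-quoted string as an identifier, which fits the "identifier ... is too long" error. Second, a result of 1 or 0 is probably the return value of `execute()` or `do()` rather than the COUNT itself; the value has to be fetched. The sketch keeps plain SQL with `?` placeholders in the hash and fetches with `selectrow_array`; `run_counts` is a hypothetical helper and `$dbh` is assumed to be an already-connected handle.

```perl
use strict;
use warnings;

# Problem reproduced: qq{"..."} keeps the inner double quotes, so the server
# receives the statement wrapped in quotes, which SQL Server (with
# QUOTED_IDENTIFIER on) treats as an identifier rather than a statement.
my $bad = qq{"SELECT COUNT(DISTINCT sessionID) FROM AgentConnectionDetail"};

# Alternative: plain SQL text with ? placeholders; no Perl interpolation.
my %queries = (
    'Calls Received' => 'SELECT COUNT(DISTINCT sessionID) FROM AgentConnectionDetail'
                      . ' WHERE startDateTime BETWEEN ? AND ?',
);

# execute() and do() report success or rows affected, not the SELECT result;
# selectrow_array prepares, executes and returns the first row's values.
# Hypothetical helper: $dbh is assumed to be a connected DBI handle.
sub run_counts {
    my ($dbh, $from, $to) = @_;
    my %result;
    for my $name (sort keys %queries) {
        ($result{$name}) = $dbh->selectrow_array($queries{$name}, undef, $from, $to);
    }
    return \%result;
}
```

Called as `my $counts = run_counts($dbh, $yesterday, $today);`, each entry of the returned hash holds the number its query produced.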
Concatenate printing a string and array
on Jan 05, 2009 at 00:04
6 direct replies
by spickles
I know of the differences between
print @array and print "@array"
I've written a decimal to binary conversion program, and I need to print the array. So I'd like to print something like:
"Your result is: 1111"
I've tried
print "Your result is: " . @array
The output of that is "Your result is: 4" I'm assuming it is because the array is getting converted to a string automatically. How can I achieve the output I'm looking for?
Regards,
Scott
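The `.` operator puts `@array` in scalar context, and an array in scalar context yields its element count, which is where the 4 comes from. A minimal sketch of the alternatives: `join` with an empty separator gives "1111", while interpolating `"@array"` inserts `$"` (a single space by default) between elements. `@array` holds sample data here.

```perl
use strict;
use warnings;

my @array = (1, 1, 1, 1);    # sample binary digits

# "." imposes scalar context on @array, which yields its element count.
my $count_version = "Your result is: " . @array;

# join glues the elements together with the given separator ('' here).
my $joined = "Your result is: " . join('', @array);

# Interpolation puts $" (a single space by default) between elements.
my $spaced = "Your result is: @array";

print "$joined\n";
```

Setting `local $" = '';` before interpolating would also give "1111" without `join`.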
Apache::Session help storing a complex data structure
on Jan 04, 2009 at 22:19
3 direct replies
by B-A-T
Hi,
I am trying to store a data structure in a session using Apache::Session::MySQL. Session creation and retrieval work fine for scalar values, HoHs, arrays, etc., but the session is not written if I try to store a tree data structure (N-ary tree http://search.cpan.org/~rkinyon/Tree-1.01/lib/Tree.pm). The code is as follows:
--
my %session;
tie %session,'Apache::Session::MySQL', undef, {
DataSource => 'dbi:mysql:sessions',
UserName => 'login',
Password => 'pass',
LockDataSource => 'dbi:mysql:sessions',
LockUserName => 'login',
LockPassword => 'pass' };
my $id = $session{_session_id};
$session{TIMESTAMP} =time;
$session{tree} = $treeP; ## $treeP is a N-ary Tree
$session{uname} = $uname;
untie(%session) || die " Cant untie! \n";
##-- in another code
my %session1;
tie %session1,'Apache::Session::MySQL', undef, {
DataSource => 'dbi:mysql:sessions',
UserName => 'login',
Password => 'pass',
LockDataSource => 'dbi:mysql:sessions',
LockUserName => 'login',
LockPassword => 'pass' };
my $u = $session1{uname}; ## this value is undef if I try to store the tree as well in the above code
print STDERR "name $u \n";
untie(%session1) or die "CANT CLOSE!! \n";
========
Can anyone help?
Thanks in advance...
cheers!
Using substr to split a string scales as N^2 -- alternatives?
on Jan 04, 2009 at 19:02
3 direct replies
by hrr
I would like to process a lengthy (binary) string of length N as a stream, chopping off the front part as the parsing goes along.
my $n = unpack("v", $str);
$str = substr($str, 2);
Now I have noted that my application considerably slows down on longer strings, and it seems that is due to the following effect, indicating that this approach scales as N^2 (instead of the naively expected N).
$ time perl -e '$a = "a" x 20_000; $a = substr($a, 1) while length($a);'
user 0m0.350s
$ time perl -e '$a = "a" x 40_000; $a = substr($a, 1) while length($a);'
user 0m0.791s
$ time perl -e '$a = "a" x 80_000; $a = substr($a, 1) while length($a);'
user 0m2.683s
user 0m2.683s
A similar issue was discussed in Access via substr refs 2000 times slower: I think the reason for this slowdown is that when constructing support for the lvalue property of substr, a copy is made.
Is there a good workaround to this scaling issue? Instead of passing an ever-shrinking stream around different functions, I could just pass the whole string and an offset,
my $n = unpack("v", substr($str, $off, 2));
$off += 2;
but this seems rather clumsy to me... Are there more elegant ways of doing this?
Update: Thanks for all the help and for the insightful explanations! As suggested, the 4-argument version of substr (setting REPLACEMENT to "") instead of the (2|3)-argument substr solves this in a very neat manner,
my $n = unpack("v", substr($str, 0, 2, ""));
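A minimal sketch of the 4-argument form applied to a whole stream of little-endian 16-bit values; `read_u16_stream` is a hypothetical name. Replacing the first two bytes with an empty string removes them from the front in place, which, per the update, avoids the N^2 behaviour of rebuilding the remainder with 2-argument `substr`.

```perl
use strict;
use warnings;

# Decode a string of little-endian unsigned 16-bit values ("v"),
# chopping each pair of bytes off the front as it is consumed.
sub read_u16_stream {
    my ($str) = @_;    # work on a copy; the caller's string is untouched
    my @values;
    while (length($str) >= 2) {
        # 4-arg substr returns the first 2 bytes and deletes them in place
        push @values, unpack('v', substr($str, 0, 2, ''));
    }
    return @values;    # a trailing odd byte, if any, is ignored
}

my @v = read_u16_stream(pack('v*', 1, 2, 513));
print "@v\n";
```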
Files not altered
on Jan 04, 2009 at 09:18
4 direct replies
by bluethundr
Hello Monks!
Finally had an occasion to use perl for something other than to complete an exercise in the llama! Still making newbish mistakes though.
The idea here is to parse some config files that reveal my database password and replace the real password with a fake password. Also does the same to hide the name of the database user. I call the program "cleaner". :)
I think I'm close to getting it to work. It does alter the username and password in the output. But when I examine the original files with cat the file is unchanged.
I think I need to select the file for output, but I am a little unsure of the usage. I'd appreciate any advice you might have.
Here's the program in question:
#!/usr/bin/perl -w
use strict;
foreach (@ARGV) {
while (<>) {
s/\s+myOldPass/ Onl33In05h15notmyreallpass/g;
s/\s+myDBUser/ my-secret-db-user/g;
print;
}
}
My ultimate goal is to be able to safely pastebin my config files to IRC so that some postfix genius can help me figure out why my postfix setup is not talking to MySQL.
Thanks guys! Very excited to be using Perl for a real world application for the first time!
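`while (<>)` only reads the files and prints to STDOUT, so the originals never change; the outer `foreach` is also redundant, because the first pass through `<>` consumes every file named in @ARGV. Setting `$^I`, the in-script equivalent of the `-i` switch, makes Perl rewrite each file with whatever is printed. A minimal sketch using the `local ($^I, @ARGV)` idiom from perlfaq5; `clean_files` is a hypothetical name, and a `.bak` copy of each original is kept. The one-liner form would be `perl -i.bak -pe '...' file...`.

```perl
use strict;
use warnings;

# Rewrite each named file in place, keeping the original as FILE.bak.
# Hypothetical helper name; the substitutions are the ones from the post.
sub clean_files {
    my @files = @_;
    local ($^I, @ARGV) = ('.bak', @files);  # $^I is the script form of -i
    while (my $line = <>) {
        $line =~ s/\s+myOldPass/ Onl33In05h15notmyreallpass/g;
        $line =~ s/\s+myDBUser/ my-secret-db-user/g;
        print $line;    # while $^I is set, print writes into the new file
    }
}

clean_files(@ARGV) if @ARGV;
```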
Regex Question
on Jan 04, 2009 at 01:06
4 direct replies
by Anonymous Monk
I have three types of URLs:
http://www.url.com
http://www.url.com/
http://www.url.com/cgi.pl?x=y
I want to strip off the stuff that isn't the domain name. So the above URLs would become:
http://www.url.com or http://www.url.com/
* Note I want the trailing "/" if it's after the domain name
This is my stab at it:
#!/usr/bin/perl -w
use strict;
my $displayed_link = 'http://www.url.com/cgi.pl?x=y';
( $displayed_link ) = $displayed_link =~ m/^(http:\/\/.*?\/)/;
print $displayed_link . "\n";
Notice that I get the URL out, but if it's just a plain http://www.url.com without a trailing slash, it gets an error:
#!/usr/bin/perl -w
use strict;
my $displayed_link = 'http://www.url.com';
( $displayed_link ) = $displayed_link =~ m/^(http:\/\/.*?\/)/;
print $displayed_link . "\n";
I've tried a bunch of ways to have it come out with "http://www.url.com" or "http://www.url.com/" regardless of the input but can't figure it out.
What's the proper way to do this?
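The original pattern requires a slash after the host, so when there isn't one the match fails and `$displayed_link` ends up undef. Making the slash optional, and letting the host run only up to the first "/", handles all three inputs. A minimal sketch; `domain_part` is a hypothetical name. For messier input, the URI module (`URI->new($url)->host`) may be more robust than a hand-rolled regex.

```perl
use strict;
use warnings;

# Return the scheme and host of a URL, plus the "/" right after the host
# if one is present; everything after that slash is dropped.
sub domain_part {
    my ($url) = @_;
    # m{} avoids escaping every "/"; [^/?#]+ stops the host at the first
    # "/", "?" or "#", and /? keeps a single following slash if present.
    my ($base) = $url =~ m{^(https?://[^/?#]+/?)};
    return $base;    # undef when the string doesn't look like a URL
}

for my $url ('http://www.url.com', 'http://www.url.com/', 'http://www.url.com/cgi.pl?x=y') {
    my $base = domain_part($url);
    print "$base\n";
}
```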
Mysteries of unpack("a", ...)
on Jan 03, 2009 at 00:31
4 direct replies
by pspinler
Hi:
Quick summary: what does unpack ("a4",...) do in comparison to unpack ("L", ...) ?
I have data from a foreign system (IBM z/VM performance history log data) that comes in 1468 byte records, with a mix of EBCDIC encoded characters and numbers in binary format (mostly IBM 390 E format 4 byte floats).
I'm using the following to read records,
binmode (STDIN);
local $/ = undef;
while (read (STDIN, $record, 1468)) {
my $parsed = &decode_record ($record);
print_record ($parsed);
}
My decode_record sub has lots of tidbits like this (multiple calls to unpack() only for my own clarity, until I get this reliably working):
sub decode_record ($) {
my $record = shift;
my %rec;
$rec{"date"} = unpack ("a8", $record);
$rec{"time"} = unpack ("x8a8", $record);
(snip)
$rec{"el_time"} = unpack ("x48a4", $record);
$rec{"samples"} = unpack ("x52a4", $record);
(snip)
return \%rec;
}
My problem is I'm having problems dealing with those a4 fields I'm unpack()ing, each of them being one of those IBM E format 4 byte floats I mentioned. I'm calling this routine to attempt to parse 'em:
sub parse_E ($) {
my $data = shift;
my ($sign, $characteristic, $fraction);
$sign = ($data & 0x80000000) ? -1 : 1;
$characteristic = (($data >> 24) & 0x7f) - 64;
$fraction = (($data & 0x00ffffff) / 0xffffff) * 16;
my $num = $sign * $fraction ** $characteristic;
printf("DEBUG: parse_E(%32s)\n\tsign: %d charisteristic: %d ".
"fraction: %f = %f\n",
unpack ("B32", $data), $sign, $characteristic,
$fraction, $num);
printf("DEBUG: unpacked characteristic %s\n",
unpack ("B7", ($data >> 24) & 0x7f));
printf("DEBUG: unpacked fraction %7s%s\n", " ",
unpack ("B24", $data & 0x00ffffff));
return $num;
}
The thing is, I'm getting results like this, which indicates that I don't know what the floop unpack("a4") does. In particular, notice the "isn't numeric" error message and also the debugging bitstring prints from my bit twiddling, which should produce 7 bits of data (bits 30-24) and 24 bits of data (bits 23-0). Instead I appear to be getting 7 bits and 8 bits, and they don't appear to match the passed-in bitstring in any way.
Argument "B<\0\0" isn't numeric in bitwise and (&) at ./testparse.pl line 47.
DEBUG: parse_E(01000010001111000000000000000000)
sign: 1 charisteristic: -64 fraction: 0.000000 = -inf
DEBUG: unpacked characteristic 0011000
DEBUG: unpacked fraction 00110000
The 'line 47' in that error message happens to be the first binary operation on the data, '$data & 0x80000000'.
But, if I change unpack ("a4", ...) to unpack ("L"), then I get this, instead:
DEBUG: parse_E(00110001001100010011000100110001)
sign: 1 charisteristic: 2 fraction: 3.750000 = 14.062502
DEBUG: unpacked characteristic 0011011
DEBUG: unpacked fraction 001100110011100100110011
I'm still doing something wrong here, since my 7bit and 24 bit bitstrings still don't match bits 30-24 and bits 23-0 in the raw data, but suddenly I stop getting the "not numeric" error message and actually see the proper length of bitstrings if not the proper data.
Would some kind soul please enlighten my stumblings?
Thanks!
-- Pat
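`a4` returns the four raw bytes as a string, so `$data & 0x80000000` makes Perl numify a string like "B<\0\0" (hence the warning, and a value of 0), and `unpack("B7", ...)`/`unpack("B24", ...)` applied to a number unpack the characters of its decimal string, not its bits: "0" is the byte 0x30, which explains the 0011000 and 00110000 output. The bytes have to be turned into an integer first: `N` reads them as a big-endian unsigned 32-bit value, matching mainframe byte order, whereas `L` uses the native order (little-endian on x86). A minimal sketch, assuming the standard IBM hexadecimal short-float layout (sign bit, 7-bit excess-64 base-16 exponent, 24-bit fraction, value = sign × fraction × 16^exponent); `ibm_e_to_float` is a hypothetical name. For the bit pattern in the post, 0x423C0000, this gives 0.234375 × 16^2 = 60.

```perl
use strict;
use warnings;

# Convert one IBM System/360 "E" (short hexadecimal) float to a Perl number.
# Layout: 1 sign bit, 7-bit base-16 exponent in excess-64, 24-bit fraction.
sub ibm_e_to_float {
    my ($bytes) = @_;                    # 4 raw bytes, e.g. from unpack("x48a4", ...)
    my $word = unpack('N', $bytes);      # 'N' = 32-bit unsigned, big-endian
    my $sign     = ($word & 0x80000000) ? -1 : 1;
    my $exponent = (($word >> 24) & 0x7f) - 64;
    my $fraction = ($word & 0x00ffffff) / 0x1000000;   # 0 <= fraction < 1
    return $sign * $fraction * 16 ** $exponent;
}

my $value = ibm_e_to_float(pack('N', 0x423C0000));   # bit pattern from the post
print "$value\n";
```

The integer could also be pulled straight out of the record with `unpack("x48N", $record)` and the byte-to-integer step skipped.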
Extracting a (UK) Address
on Jan 02, 2009 at 05:14
4 direct replies
by ropey
Fellow Monks,
I am faced with a challenge: extracting clients' names and addresses from a bunch of Word documents.
I came to the conclusion that processing raw text would be easier than trying to parse a Word-formatted document, so I use Win32::OLE to open the documents and save them as text only. Now I come to the part of extracting the address data, and before I start I would like to ask for some advice.
Has anyone done something similar to this before? The obvious choice would be a regex, but the format of a name and address can vary considerably (consider MR and Mrs D.M Smith, Mrs & Mr D Smith-Brown, etc.), and an address can vary even more, so before I reinvent the wheel: has this been done before? Searching CPAN, there are modules such as Geo::PostalAddress or Lingua::EN::AddressParse which do something similar, but they do not 'extract' the address from a raw text document, do they?
Has anyone faced a similar problem, and could you advise on how to resolve it?
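One possible heuristic, not an established module: UK addresses usually end with a postcode, so find lines that look like a postcode and take the lines above them as the address block, which could then be handed to a parser such as Lingua::EN::AddressParse. A minimal sketch; the postcode pattern is a simplified shape check (not a full validator, and it assumes upper case), and `find_address_blocks` is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified UK postcode shape (e.g. "SW1A 1AA", "M1 1AE"); NOT a full
# validator, and it assumes the postcode is written in upper case.
my $POSTCODE = qr/\b[A-Z]{1,2}[0-9][A-Z0-9]?\s*[0-9][A-Z]{2}\b/;

# For every line containing a postcode, collect it plus the lines above it,
# stopping at a blank line or after $max_lines lines in total.
sub find_address_blocks {
    my ($text, $max_lines) = @_;
    $max_lines ||= 6;
    my @lines = split /\r?\n/, $text;
    my @blocks;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ $POSTCODE;
        my @block;
        for (my $j = $i; $j >= 0 && @block < $max_lines; $j--) {
            last if $lines[$j] =~ /^\s*$/;
            unshift @block, $lines[$j];
        }
        push @blocks, \@block;
    }
    return @blocks;    # list of array refs, one per candidate address
}
```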
Removing first part of string with regex
on Jan 02, 2009 at 04:59
9 direct replies
by Anonymous Monk
Dear all,
I have the following strings sample:
human.NT_113898
human.contig.1
human.2
human.IV
What I want to do is remove everything up to the first period and capture everything after it, yielding:
NT_113898
contig.1
2
IV
How come my regex below doesn't work:
/\.?(\S+)/;
print "$1\n";
What's the right way to do it?
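In `/\.?(\S+)/` the `\.?` is optional, so the match succeeds at the very first character and `(\S+)` captures the whole string. Anchoring at the start and consuming non-dot characters up to the first period fixes that. A minimal sketch; `after_first_dot` is a hypothetical name. `(split /\./, $s, 2)[1]` would do the same job without a capture.

```perl
use strict;
use warnings;

# Return everything after the first "." (undef if there is no dot).
sub after_first_dot {
    my ($s) = @_;
    # [^.]* cannot cross a dot, so the \. that follows is the FIRST dot
    my ($rest) = $s =~ /^[^.]*\.(.*)$/s;
    return $rest;
}

for my $id (qw(human.NT_113898 human.contig.1 human.2 human.IV)) {
    my $rest = after_first_dot($id);
    print "$rest\n";
}
```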
How much is Perl6 the community rewrite of Perl?
on Jan 01, 2009 at 10:40
8 direct replies
by zby
The initial plan from Larry Wall was:
Perl 5 was my rewrite of Perl. I want Perl 6 to be the community's rewrite of Perl and of the community.
How does that stand against the reality of 8 years of development? I should add that I don't want to start a flamewar here; I am just curious. It seems like a natural evolution from the simple structure of Larry's authoritarian rule to a more complex and (at least partially) democratic organisation. It clearly was much more difficult than anyone expected, and thus it can be an interesting datapoint for Open Source theory (to contrast it with: Open Source Projects Manage Themselves? Dream On.).
New Meditations
RFC: Array::GroupBy
on Jan 05, 2009 at 22:19
4 direct replies
by kyle
After reading of bradcathey's plight in Ways to group elements of an AoH, I thought this task might be worth wrapping up in a CPAN module, so I wrote Array::GroupBy. I present it here for the consideration of the monks so that I may grow from your wisdom, creativity, and tomfoolery.
Things that particularly concern me:
- The name. Having written Array::GroupBy, I think it should be called Array::Grouper. Then again, maybe "array" isn't right since it operates more specifically on an AoH, and maybe the monks have an even better idea.
- The documentation. In my brief search for a way to describe what GROUP BY does, one of the better "descriptions" said it's hard to describe and it's best to learn by example. I punted on that, and when writing the docs I often got the feeling I wasn't describing things very well. Any direction here would be appreciated.
- My Moose use. This is the first thing I've done with Moose besides play.
Thank you for your thoughts!
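For readers weighing the naming and documentation questions, a minimal plain-Perl sketch of what GROUP BY means for an AoH: rows are bucketed by the value of one key, in first-seen order. This is not Array::GroupBy's interface (the post doesn't show it); `group_by` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Group an array of hashes (AoH) by the value of one key, keeping groups in
# first-seen order. Returns a list of { $key => value, rows => [...] }.
sub group_by {
    my ($key, @rows) = @_;
    my (%groups, @order);
    for my $row (@rows) {
        my $value = $row->{$key};
        push @order, $value unless exists $groups{$value};
        push @{ $groups{$value} }, $row;
    }
    # +{ forces an anonymous hash rather than a block
    return map { +{ $key => $_, rows => $groups{$_} } } @order;
}

my @groups = group_by('dept',
    { dept => 'sales', name => 'ann' },
    { dept => 'ops',   name => 'bob' },
    { dept => 'sales', name => 'cid' },
);
```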
GET -USsexd no more
on Jan 02, 2009 at 18:57
0 direct replies
by Anonymous Monk
The following no longer work, since LWP::Debug is deprecated:
GET -USsexd
lwp-request -m get -USsexd
Weep
Stories from the front
on Jan 02, 2009 at 16:05
2 direct replies
by talexb
I don't have a blog. When I have something to write, it pretty much ends up on Perlmonks. This is (as the title suggests) more story than meditation. Maybe what I've uncovered is obvious, maybe not. I just hope it will help someone out.
I'm developing for the Solaris (Sparc) OS. Mostly things are the same, but sometimes they aren't. When I was having problems getting https://localhost/ to work, I tried using openssl to get in:
openssl s_client -connect localhost:443
No dice -- got an error right away. Eventually, I discovered that
openssl s_client -connect 10.1.1.161:443
worked instead. I'm sure someone can explain that -- I can't.
Next, I was trying to run a test script to hit the local webserver. Since I'd already discovered localhost wouldn't do that, I was using the IP address, which worked fine until there was a re-direct, at which point things just died quietly (i.e., nothing obvious in the web logs).
So then I tried using a browser (with the IP address, not the name) to check that things worked, and discovered that Firefox complained that the SSL certificate I had installed on this (virtual) box didn't match the IP address I was trying to use. So I put the name of the host into my test script, and was finally able to log in.
So my test script is fairly simple: try to log in as a variety of different users, each with a different set of privileges, and confirm that the user can see the links that they're allowed to see, and if the link does exist, try to follow it, making sure that a valid page (HTTP code is 200) gets returned.
Since some of the links belong to packages that aren't installed yet, I expect some of them to fail, so that part of the code is inside a SKIP block. This worked fine when I ran the script from my Ubuntu box, hitting the Solaris box, but now that I'm running from within Solaris, the test script dies as soon as a non-existent page is requested, and I'm not sure why.
The links are in a table, so I've just commented those elements out, and now my test script is passing OK -- I guess I need to go back to the docs and read them again.
Sometimes development is maddening like that.
Update: I've started a SoPW question to follow up on my last point.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
IO::Lambda: call for participation
on Jan 01, 2009 at 17:50
2 direct replies
by dk
Hello everyone!
During the last year, I was busy writing an async I/O module, IO::Lambda, which I believe does the task of expressing callback-based I/O logic much more elegantly than it was ever done before, by using a different concept. The module includes async versions of DNS, SNMP, HTTP, and DBI (the cool part about DBI is that it can work asynchronously by using either forks, threads, or even a socket connection). It's a lot, but not as much a lot as I think I need. I'm planning to use IO::Lambda to write a new separate module for httpd, and possibly modules for ftp and irc too, but I don't have enough time for it all. Also I find it hard to determine the right balance for httpd, where the module ends, and where a httpd application begins.
This is also a help call: if anyone wants to write new modules, or contribute to the development of the existing ones, please volunteer, that would be really greatly appreciated.
If you don't know where to start, there's documentation and examples.
There's also a mailing list at io-lambda-general at lists.sourceforge.net, and in realtime I'm McFist on #perl.
Thanks!
New Cool Uses for Perl
yet another time module
on Jan 05, 2009 at 21:41
2 direct replies
by shmem
Well, again... a silly use for perl. Couldn't resist...
Nah, don't upvote PLZ. The merits go to thezip, again, for his CB entry
my @T = lolcatime(time);
so grab one of his nodes to upvote. Ready? ok...
package lolcatime;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(lolcatime localtime);
*localtime = \&lolcatime;
sub lolcatime {
my $foo;
return wantarray
? CORE::localtime(shift||time)
: uc((($foo=scalar(CORE::localtime(shift||time)))=~s/:(\d+) / /)
&& "$foo - i can haz ur $1 seconz?");
}
1;
perl -Mlolcatime -le 'print scalar lolcatime time'
TUE JAN 6 03:41 2009 - I CAN HAZ UR 31 SECONZ?
kthxbye
--shmem
update: changed title, added sample usage
Earthquakes in three dimensions
on Jan 01, 2009 at 16:17
4 direct replies
by snowhare
With the annual 'OMG there are earthquakes in Yellowstone and we are all going to die' news cycle in progress, I looked at the USGS web page for the earthquake swarm. The page was singularly uninformative because the swarm is very localized and you can't really tell what was going on from the picture.
But they also have the raw data that the map is generated from available. And I thought "Hey, I'll bet I could generate a 3D scattergram of that data."
And thanks to Perl and gnuplot, I could. More, I could even animate it. :)
This will only work on *nix style systems with gnuplot installed.
Read more... Charting Earthquakes in three dimensions (15 kB)